Nonlinear Dimensionality Reduction Methods for Use with Automatic Speech Recognition

Authors

Stephen A. Zahorian

Hongbing Hu

Abstract

For nearly a century, researchers have investigated and used mathematical techniques for reducing the dimensionality of vector-valued data used to characterize categorical data, with the goal of preserving “information,” or the discriminability of the different categories, in the reduced-dimensionality data. The most established techniques are Principal Components Analysis (PCA) and Linear Discriminant Analysis (LDA) (Jolliffe, 1986; Wang & Paliwal, 2003). Both PCA and LDA are based on linear transformations, i.e., multiplication by a matrix. For PCA, the transformation minimizes the mean square error between the original data vectors and the data vectors that can be estimated from the reduced-dimensionality vectors. For LDA, the transformation maximizes the ratio of “between-class variance” to “within-class variance,” so that data variation within each class is reduced while the separation between classes is increased. There are newer versions of these methods, such as Heteroscedastic Discriminant Analysis (HDA) (Kumar & Andreou, 1998; Saon et al., 2000). However, in all cases certain assumptions are made about the statistical properties of the original data (such as a multivariate Gaussian distribution); even more fundamentally, the transformations are restricted to be linear. In this chapter, a class of nonlinear transformations is presented from both a theoretical and an experimental point of view. Theoretically, the nonlinear methods have the potential to be more “efficient” than linear methods, that is, to give better representations with fewer dimensions. In addition, examples are shown from Automatic Speech Recognition (ASR) experiments in which the nonlinear methods do perform better, yielding higher ASR accuracy than either the original speech features or linearly reduced feature sets. Two nonlinear transformation methods, along with several variations, are presented.
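
The two linear criteria can be made concrete with a short sketch. The NumPy code below is an illustration, not code from the chapter; the function names pca and lda are hypothetical, and the LDA part assumes the within-class scatter matrix is invertible. PCA keeps the leading eigenvectors of the data covariance, which minimizes the reconstruction mean square error; LDA keeps the leading eigenvectors of inv(S_w) S_b, which maximizes the ratio of between-class to within-class scatter.

```python
import numpy as np


def pca(X, k):
    """Linear PCA (illustrative): project onto the k leading covariance eigenvectors.

    Returns the d x k projection W, the reduced features Z, and the mean
    square error of reconstructing X from Z.
    """
    mu = X.mean(axis=0)
    Xc = X - mu
    evals, evecs = np.linalg.eigh(np.cov(Xc, rowvar=False))  # ascending order
    W = evecs[:, ::-1][:, :k]          # leading k eigenvectors
    Z = Xc @ W                         # reduced-dimensionality features
    X_hat = Z @ W.T + mu               # estimate of X from Z
    return W, Z, np.mean((X - X_hat) ** 2)


def lda(X, y, k):
    """Linear LDA (illustrative): maximize between- vs within-class scatter.

    Assumes the within-class scatter matrix Sw is invertible.
    """
    mu = X.mean(axis=0)
    d = X.shape[1]
    Sw = np.zeros((d, d))              # within-class scatter
    Sb = np.zeros((d, d))              # between-class scatter
    for c in np.unique(y):
        Xc = X[y == c]
        mc = Xc.mean(axis=0)
        Sw += (Xc - mc).T @ (Xc - mc)
        diff = (mc - mu)[:, None]
        Sb += len(Xc) * (diff @ diff.T)
    # Generalized eigenproblem Sb w = lambda Sw w, solved as inv(Sw) Sb w = lambda w.
    evals, evecs = np.linalg.eig(np.linalg.solve(Sw, Sb))
    order = np.argsort(evals.real)[::-1]
    W = evecs[:, order[:k]].real
    return W, X @ W
```

On two classes separated along a low-variance axis, pca(X, 1) picks the high-variance axis while lda(X, y, 1) picks the separating one; this is the basic difference between reconstruction-oriented and discrimination-oriented reduction.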
In the first of these methods, referred to as nonlinear PCA (NLPCA), the nonlinear transformation is chosen to minimize the mean square error between the original features and the features estimated from the reduced-dimensionality features; this method is thus patterned after PCA. In the second method, referred to as nonlinear LDA (NLDA), the nonlinear transformation is chosen to maximize the discriminability of the data categories; this method is thus patterned after LDA. In all cases, the dimensionality reduction is accomplished with a neural network (NN) that internally encodes the data with a reduced number of dimensions. The methods differ in the error criterion used to train the network, the architecture of the network, and the extent to which the reduced dimensions are “hidden” in the network.
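
A minimal NumPy sketch of the shared idea behind the two neural-network methods follows. It is an illustration under stated assumptions, not the authors' networks: the layer sizes, tanh activations, full-batch gradient descent, learning rate, and all helper names (init_mlp, forward, train, nlpca, nlda) are hypothetical. The NLPCA-style network is an autoencoder trained with mean square error to reproduce its input through a k-unit bottleneck. For the NLDA-style network, softmax cross-entropy on class labels is used here as a stand-in for "maximizing discriminability"; the chapter's exact criterion may differ.

```python
import numpy as np


# Hypothetical helpers: a tiny fully connected network in plain NumPy.
def init_mlp(sizes, rng):
    """Random (W, b) pairs for consecutive layer sizes, e.g. [d, 10, k, 10, d]."""
    return [(rng.normal(0.0, 1.0 / np.sqrt(m), (m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]


def forward(params, X):
    """Activations of every layer: tanh for hidden layers, linear output."""
    acts = [X]
    for i, (W, b) in enumerate(params):
        a = acts[-1] @ W + b
        acts.append(a if i == len(params) - 1 else np.tanh(a))
    return acts


def softmax(A):
    E = np.exp(A - A.max(axis=1, keepdims=True))
    return E / E.sum(axis=1, keepdims=True)


def train(params, X, T, loss, lr=0.05, epochs=2000):
    """Full-batch gradient descent with backpropagation.

    loss="mse":  mean square error to T (NLPCA-style, T = X).
    loss="xent": softmax cross-entropy to one-hot T (NLDA-style stand-in).
    """
    params = list(params)              # do not mutate the caller's list
    n = len(X)
    for _ in range(epochs):
        acts = forward(params, X)
        out = acts[-1]
        # Gradient of the loss with respect to the linear output layer.
        if loss == "mse":
            delta = 2.0 * (out - T) / (n * T.shape[1])
        else:
            delta = (softmax(out) - T) / n
        for i in range(len(params) - 1, -1, -1):
            W, b = params[i]
            gW, gb = acts[i].T @ delta, delta.sum(axis=0)
            if i > 0:
                # Backpropagate through the tanh of the layer feeding layer i.
                delta = (delta @ W.T) * (1.0 - acts[i] ** 2)
            params[i] = (W - lr * gW, b - lr * gb)
    return params


def nlpca(X, k, hidden=10, seed=0, **kw):
    """Autoencoder d -> hidden -> k -> hidden -> d trained to reproduce X."""
    d = X.shape[1]
    params = init_mlp([d, hidden, k, hidden, d], np.random.default_rng(seed))
    params = train(params, X, X, "mse", **kw)
    return forward(params, X)[2], params   # k-dim bottleneck outputs


def nlda(X, y, k, hidden=10, seed=0, **kw):
    """Classifier d -> hidden -> k -> classes trained on the labels y."""
    T = np.eye(int(y.max()) + 1)[y]        # one-hot targets
    params = init_mlp([X.shape[1], hidden, k, T.shape[1]],
                      np.random.default_rng(seed))
    params = train(params, X, T, "xent", **kw)
    return forward(params, X)[2], params   # k-dim hidden-layer features
```

Only the target and the loss change between the two calls; in both, the reduced features are read from the k-unit bottleneck rather than from the network output, which is one way to realize reduced dimensions that are "hidden" inside the network.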


Similar resources

Dimensionality Reduction and Improving the Performance of Automatic Modulation Classification using Genetic Programming (RESEARCH NOTE)

This paper shows how we can take advantage of genetic programming in the selection of suitable features for automatic modulation recognition. Automatic modulation recognition is one of the essential components of modern receivers. In this regard, the selection of suitable features may significantly affect the performance of the process. Simulations were conducted with 5 dB and 10 dB SNRs. Test and ...


Audio Visual Speech Recognition Using Deep Recurrent Neural Networks

In this work, we propose a training algorithm for an audiovisual automatic speech recognition (AV-ASR) system using a deep recurrent neural network (RNN). First, we train a deep RNN acoustic model with a Connectionist Temporal Classification (CTC) objective function. The frame labels obtained from the acoustic model are then used to perform a non-linear dimensionality reduction of the visual featu...


A Mixture Model of Supervised Probabilistic Principal Component Analysis in a Lossless Dimensionality Reduction Framework for Face Recognition

In this paper, we first proposed the supervised version of probabilistic principal component analysis mixture model. Then, we consider a learning predictive model with projection penalties, as an approach for dimensionality reduction without loss of information for face recognition. In the proposed method, first a local linear underlying manifold of data samples is obtained using the supervised...


A Database for Automatic Persian Speech Emotion Recognition: Collection, Processing and Evaluation

Recent developments in robotics automation have motivated researchers to improve the efficiency of interactive systems by making a natural man-machine interaction. Since speech is the most popular method of communication, recognizing human emotions from speech signal becomes a challenging research topic known as Speech Emotion Recognition (SER). In this study, we propose a Persian em...


Neural Network Based Nonlinear Discriminant Analysis for Speech Recognition

Neural networks have been one of the most successful recognition models for automatic speech recognition systems because of their high discriminative power and adaptive learning. In many speech recognition tasks, especially for discrete speech classification, it has been shown that neural networks are very powerful for classifying short-time acoustic-phonetic units, such as individual phonemes....



Publication date: 2011
